241 research outputs found

    A Deep Generative Model of Vowel Formant Typology

    What makes some types of languages more probable than others? For instance, we know that almost all spoken languages contain the vowel phoneme /i/; why should that be? The field of linguistic typology seeks to answer these questions and, thereby, divine the mechanisms that underlie human language. In our work, we tackle the problem of vowel system typology, i.e., we propose a generative probability model of which vowels a language contains. In contrast to previous work, we work directly with the acoustic information -- the first two formant values -- rather than modeling discrete sets of phonemic symbols (IPA). We develop a novel generative probability model and report results based on a corpus of 233 languages. Comment: NAACL 2018.
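    As a toy illustration of the modeling idea (not the paper's actual deep generative model), the sketch below samples a vowel inventory as points in (F1, F2) formant space from a Gaussian mixture whose means loosely follow the cardinal vowels /i/, /a/, /u/; every number in it is an invented placeholder, not a learned parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixture means loosely follow the cardinal vowels /i/, /a/, /u/ in
# (F1, F2) space, in Hz; the numbers are illustrative placeholders,
# not parameters learned by the paper's model.
PROTOTYPES = {
    "i": (280.0, 2250.0),
    "a": (700.0, 1220.0),
    "u": (310.0, 870.0),
}

def sample_inventory(size, spread=50.0):
    """Sample a toy vowel inventory: pick a prototype for each slot,
    then jitter its formant values with Gaussian noise."""
    labels = rng.choice(list(PROTOTYPES), size=size)
    return [(lab, rng.normal(PROTOTYPES[lab], spread)) for lab in labels]

for label, (f1, f2) in sample_inventory(5):
    print(f"{label}: F1 = {f1:6.1f} Hz, F2 = {f2:6.1f} Hz")
```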

    One-Shot Neural Cross-Lingual Transfer for Paradigm Completion

    We present a novel cross-lingual transfer method for paradigm completion, the task of mapping a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up to 58% higher accuracy than without transfer and show that even zero-shot and one-shot learning are possible. We further find that the degree of language relatedness strongly influences the ability to transfer morphological knowledge. Comment: Accepted at ACL 2017.
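    A minimal sketch of the transfer setup, under the assumption that training data are (lemma, morphological tag, inflected form) triples and that a language tag on the input side lets one character-level model serve both languages; the language pair and all examples below are invented, and the paper's actual system is a neural encoder-decoder trained on data composed like this.

```python
# Each example maps (lemma, morphological tag) -> inflected form. The
# languages and examples are invented stand-ins for the paper's
# high-resource / low-resource pairs.
high_resource = [
    ("machen", "V;PST;3;SG", "machte"),
    ("sagen",  "V;PST;3;SG", "sagte"),
    ("fragen", "V;PST;3;SG", "fragte"),
]
low_resource = [
    ("maken", "V;PST;3;SG", "maakte"),  # a single example: the one-shot case
]

def build_training_set(source, target):
    """Concatenate source- and target-language data; a language tag on the
    input side lets a single character-level model serve both languages."""
    data = []
    for lang, examples in (("SRC", source), ("TGT", target)):
        for lemma, tag, form in examples:
            # input: language tag + morph tag + lemma spelled out in characters
            data.append((f"{lang} {tag} {' '.join(lemma)}", " ".join(form)))
    return data

for inp, out in build_training_set(high_resource, low_resource):
    print(inp, "->", out)
```

    Dropping the target-language examples entirely gives the zero-shot condition the abstract mentions; keeping exactly one gives the one-shot condition.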

    Context-Aware Prediction of Derivational Word-forms

    Derivational morphology is a fundamental and complex characteristic of language. In this paper we propose the new task of predicting the derivational form of a given base-form lemma that is appropriate for a given context. We present an encoder-decoder style neural network that produces a derived form character by character, based on character-level representations of the base form and the context. We demonstrate that our model is able to generate valid context-sensitive derivations from known base forms, but is less accurate in a lexicon-agnostic setting.
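    A hedged sketch of what the task's input/output pairs might look like, assuming the derived word's sentence context is supplied with a blank slot; the examples and the CTX/SEP marker scheme are invented for illustration, and the paper's model is a character-level encoder-decoder consuming data of roughly this shape.

```python
# Each training example pairs a base-form lemma with a sentence context in
# which the derived word should appear (marked ___). All examples are
# invented; the marker tokens CTX and SEP are hypothetical.
examples = [
    ("the ___ of the treaty was delayed", "ratify", "ratification"),
    ("her ___ to detail is remarkable", "attend", "attention"),
]

def make_io_pair(context, base, derived):
    """Build (input, target) strings for a character-level encoder-decoder:
    the context plus the spelled-out base form on the input side, the
    spelled-out derived form as the target."""
    inp = f"CTX {context} SEP {' '.join(base)}"
    tgt = " ".join(derived)
    return inp, tgt

for ctx, base, drv in examples:
    inp, tgt = make_io_pair(ctx, base, drv)
    print(inp, "->", tgt)
```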

    A Fast Algorithm for Computing Prefix Probabilities

    Multiple algorithms are known for efficiently calculating the prefix probability of a string under a probabilistic context-free grammar (PCFG). Good algorithms for the problem have a runtime cubic in the length of the input string. However, some proposed algorithms are suboptimal with respect to the size of the grammar. This paper proposes a novel speed-up of Jelinek and Lafferty's (1991) algorithm, which runs in $\mathcal{O}(N^3|\mathcal{N}|^3 + |\mathcal{N}|^4)$, where $N$ is the input length and $|\mathcal{N}|$ is the number of non-terminals in the grammar. In contrast, our speed-up runs in $\mathcal{O}(N^2|\mathcal{N}|^3 + N^3|\mathcal{N}|^2)$. Comment: To be published in the Proceedings of ACL 2023.
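    To make the quantity being computed concrete: the prefix probability of a string is the total probability, under the PCFG, of all sentences that begin with that string. The brute-force sketch below checks this definition on an invented toy grammar by enumerating depth-bounded leftmost derivations; it is exponential-time and only a lower bound for recursive grammars, nothing like the cubic algorithms the abstract discusses.

```python
# A toy PCFG: each nonterminal maps to (probability, right-hand side) rules;
# uppercase symbols are nonterminals, lowercase strings are terminals.
# The grammar and its weights are invented for illustration.
PCFG = {
    "S":    [(1.0, ("NP", "VP"))],
    "NP":   [(0.6, ("she",)), (0.4, ("the", "cat"))],
    "VP":   [(0.8, ("runs",)), (0.2, ("runs", "AdvP"))],
    "AdvP": [(1.0, ("fast",))],
}

def derivations(symbols=("S",), prob=1.0, depth=12):
    """Yield (sentence, probability) for every leftmost derivation of
    `symbols`, up to a recursion bound. Exponential-time; illustration only."""
    if depth == 0:
        return
    nonterminals = [i for i, s in enumerate(symbols) if s in PCFG]
    if not nonterminals:
        yield " ".join(symbols), prob
        return
    i = nonterminals[0]  # expand the leftmost nonterminal
    for p, rhs in PCFG[symbols[i]]:
        yield from derivations(symbols[:i] + rhs + symbols[i + 1:],
                               prob * p, depth - 1)

def prefix_probability(prefix):
    """Total probability of all sentences whose first words match `prefix`
    (a lower bound here, because derivations are depth-bounded)."""
    return sum(p for sent, p in derivations()
               if sent.split()[:len(prefix)] == list(prefix))

print(prefix_probability(("the",)))  # -> 0.4 (up to float rounding)
```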